Learning to Filter Junk E-Mail from Positive and Unlabeled Examples
Abstract
We study the applicability of partially supervised text classification to junk mail filtering, where a given set of junk messages serves as positive examples and the messages received by a user serve as unlabeled examples, with no negative examples available. Supplying a junk mail filter with a large set of junk mails could yield an algorithm that learns to filter junk mail without user intervention, which would significantly improve the usability of an e-mail client. We study several learning algorithms that handle the unlabeled examples in different ways and present experimental results.
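To make the learning setup concrete, the following is a minimal sketch of one common baseline for positive-and-unlabeled data: train a naive Bayes classifier while treating every unlabeled message as if it were negative. The abstract does not say which algorithms the paper evaluates, so this is an illustrative assumption and not the authors' method; the function names, the whitespace tokenizer, and the toy messages are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a deliberate simplification."""
    return text.lower().split()


def train_pu_baseline(positive_docs, unlabeled_docs):
    """Fit multinomial naive Bayes, treating unlabeled messages as negatives.

    Assumption: this 'unlabeled = negative' shortcut is only a baseline;
    some unlabeled messages are really junk, which biases the model.
    """
    counts = {"junk": Counter(), "other": Counter()}
    for doc in positive_docs:
        counts["junk"].update(tokenize(doc))
    for doc in unlabeled_docs:
        counts["other"].update(tokenize(doc))
    vocab = set(counts["junk"]) | set(counts["other"])
    n_docs = len(positive_docs) + len(unlabeled_docs)
    priors = {
        "junk": len(positive_docs) / n_docs,
        "other": len(unlabeled_docs) / n_docs,
    }
    return {"counts": counts, "vocab": vocab, "priors": priors}


def junk_log_odds(model, text):
    """Return log P(junk | text) - log P(other | text); > 0 favours junk."""
    vocab_size = len(model["vocab"])
    scores = {}
    for label in ("junk", "other"):
        label_counts = model["counts"][label]
        total = sum(label_counts.values())
        score = math.log(model["priors"][label])
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((label_counts[token] + 1) / (total + vocab_size))
        scores[label] = score
    return scores["junk"] - scores["other"]


# Toy data: known junk (positive) and a user's mailbox (unlabeled).
# Note that the unlabeled set contains one junk-like message.
junk_examples = ["cheap pills buy now", "win money now", "buy cheap watches"]
mailbox = [
    "meeting agenda for monday",
    "lunch on friday",
    "win money now cheap",
    "project report attached",
]

model = train_pu_baseline(junk_examples, mailbox)
print(junk_log_odds(model, "buy cheap pills"))
print(junk_log_odds(model, "meeting agenda monday"))
```

A positive score suggests junk and a negative score suggests ordinary mail. Because hidden junk in the mailbox is counted as negative evidence, this baseline tends to underestimate how junk-like a message is, which is one reason to look for algorithms that treat the unlabeled set more carefully.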
Similar resources
Prologue: A machine learning sampler
You may not be aware of it, but chances are that you are already a regular user of machine learning technology. Most current e-mail clients incorporate algorithms to identify and filter out spam e-mail, also known as junk e-mail or unsolicited bulk e-mail. Early spam filters relied on hand-coded pattern matching techniques such as regular expressions, but it soon became apparent that this is h...
E-mail Filtering Tool
As e-mail becomes one of the most widely used methods of communication, with its ease of use, speed and low cost, it has also become the target of advertisers. Just like “snail” mail, e-mail has become prone to junk e-mail, but unlike “snail” mail the cost of distributing unsolicited junk e-mail to vast numbers of people is relatively cheap. This has led to the desire for these unwanted e-mails ...
Algorithm of E-mail Classification Based on Automatic Adapting for User
E-mail classification is an effective way to manage mail, improve processing efficiency, and filter junk mail. Extracting e-mail features is the key problem for accurate classification. To give the classification more distinctive feature words, IDF (inverse document frequency) is used to further refine the features. The procedure which users deal with E...
Positive and Unlabeled Examples Help Learning
In many learning problems, labeled examples are rare or expensive while numerous unlabeled and positive examples are available. However, most learning algorithms only use labeled examples. Thus we address the problem of learning with the help of positive and unlabeled data given a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorit...
A Survey on Various Classifiers Detecting Gratuitous Email Spamming
Email has become a major means of communication these days. Most people use email for personal or professional purposes. Email is an effective, fast, and cheap way to communicate. The importance and usage of email are growing day by day. It provides a way to easily transfer information globally over the internet. Due to it the email spamming is increasing day by d...
Publication date: 2004